Bayes and maximum likelihood for L1-Wasserstein deconvolution of Laplace mixtures
We consider the problem of recovering a distribution function on the real
line from observations additively contaminated with errors following the
standard Laplace distribution. Assuming that the latent distribution is
completely unknown leads to a nonparametric deconvolution problem. We begin by
studying the rates of convergence relative to the L1-norm and the Hellinger metric for the direct problem of estimating the sampling density, which is a mixture of Laplace densities with a possibly unbounded set of locations: the rate of convergence for the Bayes' density estimator corresponding to a Dirichlet process prior over the space of all mixing distributions on the real line matches, up to a logarithmic factor, the rate for the maximum likelihood estimator. Then, appealing to an inversion inequality translating the L1-norm and the Hellinger distance between general kernel mixtures, with a kernel density having polynomially decaying Fourier transform, into any Lp-Wasserstein distance, p ≥ 1, between the corresponding mixing distributions, provided their Laplace transforms are finite in some neighborhood of zero, we derive the rates of convergence in the Lp-Wasserstein metric for the Bayes' and maximum likelihood estimators of the mixing distribution. Merging in the Lp-Wasserstein distance between Bayes and maximum likelihood follows as a by-product, along with an assessment of the stochastic order of the discrepancy between the two estimation procedures.
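In symbols, and in notation of our own choosing rather than the paper's, the sampling model and the accuracy metric above can be sketched as

$$ Y = X + \varepsilon, \quad X \sim G, \quad \varepsilon \sim \tfrac{1}{2}e^{-|u|}, \qquad p_G(y) = \int_{\mathbb{R}} \tfrac{1}{2}e^{-|y-x|}\,\mathrm{d}G(x), $$

$$ W_p(G, G') = \Big( \inf_{\gamma \in \Gamma(G, G')} \int |u-v|^p \,\mathrm{d}\gamma(u,v) \Big)^{1/p}, \qquad p \ge 1, $$

where Γ(G, G') is the set of couplings of G and G': the direct problem is estimating the Laplace mixture density p_G; the inverse (deconvolution) problem is estimating the mixing distribution G under W_p.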
Bayesian adaptation
Given the need for low-assumption inferential methods in infinite-dimensional settings, Bayesian adaptive estimation via a prior distribution that depends neither on the regularity of the function to be estimated nor on the sample size is valuable. We elucidate the relationships among the main approaches followed to design priors for minimax-optimal rate-adaptive estimation, shedding light on the underlying ideas. Comment: 20 pages, Propositions 3 and 5 added.
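A recurring construction in this literature, sketched here in our own notation, is the hierarchical (sieve) prior: draw a model index k (a regularity level or a sieve dimension) from a hyperprior λ, then draw the function from a prior Π_k tuned to the k-th model,

$$ \Pi = \sum_{k \ge 1} \lambda(k)\, \Pi_k, $$

so that neither λ nor the Π_k depends on the unknown regularity or on the sample size, yet the resulting mixture prior can contract at the minimax rate for each regularity level.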
Empirical Bayes conditional density estimation
The problem of nonparametric estimation of the conditional density of a
response, given a vector of explanatory variables, is classical and of
prominent importance in many prediction problems since the conditional density
provides a more comprehensive description of the association between the
response and the predictor than, for instance, does the regression function.
The problem has applications across different fields, such as economics, actuarial science, and medicine. We investigate empirical Bayes estimation of conditional densities, establishing that an automatic data-driven selection of the prior hyper-parameters in infinite mixtures of Gaussian kernels, with predictor-dependent mixing weights, can lead to estimators whose performance is on par with that of frequentist estimators: they are minimax-optimal (up to logarithmic factors) rate adaptive over classes of locally Hölder smooth conditional densities, and they perform an adaptive dimension reduction if the response is independent of (some of) the explanatory variables, which, containing no information about the response, are irrelevant to the purpose of estimating its conditional density.
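As a toy illustration of the model class, here is a minimal sketch, with entirely illustrative parameters of our own choosing, of a conditional density built from Gaussian kernels with predictor-dependent (softmax) mixing weights:

```python
import numpy as np

# Toy conditional density f(y | x) = sum_j w_j(x) N(y; mu_j, s_j^2),
# with softmax mixing weights driven by predictor-dependent scores.
# All parameters below are illustrative, not the paper's prior.

def conditional_density(y, x, mus, sigmas, alphas, betas):
    scores = alphas + betas * x                  # predictor-dependent scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                                 # softmax mixing weights
    kernels = np.exp(-0.5 * ((y - mus) / sigmas) ** 2) / (
        np.sqrt(2.0 * np.pi) * sigmas)           # Gaussian kernels
    return float(w @ kernels)

# Two components whose weights shift with x: as x grows, the mass of
# f(. | x) moves toward the second component.
mus = np.array([-1.0, 2.0]); sigmas = np.array([0.5, 1.0])
alphas = np.array([0.0, 0.0]); betas = np.array([-2.0, 2.0])
print(conditional_density(0.0, 1.5, mus, sigmas, alphas, betas))
```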
Convergence rates for Bayesian density estimation of infinite-dimensional exponential families
We study the rate of convergence of posterior distributions in density
estimation problems for log-densities in periodic Sobolev classes characterized
by a smoothness parameter p. The posterior expected density provides a
nonparametric estimation procedure attaining the optimal minimax rate of
convergence under Hellinger loss if the posterior distribution achieves the
optimal rate over certain uniformity classes. A prior on the density class of
interest is induced by a prior on the coefficients of the trigonometric series
expansion of the log-density. We show that, when p is known, the posterior distribution corresponding to a Gaussian prior achieves the optimal rate, provided the prior variances die off sufficiently rapidly. For a mixture of normal distributions,
the mixing weights on the dimension of the exponential family are assumed to be
bounded below by an exponentially decreasing sequence. To avoid the use of
infinite bases, we develop priors that cut off the series at a
sample-size-dependent truncation point. When the degree of smoothness is
unknown, a finite mixture of normal priors indexed by the smoothness parameter,
which is also assigned a prior, produces the best rate. A rate-adaptive
estimator is derived. Comment: Published at http://dx.doi.org/10.1214/009053606000000911 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
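In notation of our own choosing, the family and the prior under study can be sketched as

$$ \log f_\theta(x) = \sum_{j \ge 1} \theta_j \phi_j(x) - c(\theta), \qquad \theta_j \sim \mathcal{N}(0, \tau_j^2) \ \text{independently}, $$

with (φ_j) the trigonometric basis, c(θ) the log-normalizing constant, and prior variances τ_j² decaying in j at a speed tied to the smoothness p; cutting the series at a sample-size-dependent truncation point gives the finite-dimensional priors mentioned in the abstract.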
On asymptotically efficient maximum likelihood estimation of linear functionals in Laplace measurement error models
Maximum likelihood estimation of linear functionals in the inverse problem of
deconvolution is considered. Given observations of a random sample from a distribution indexed by a (potentially infinite-dimensional) parameter, namely the distribution of the latent variable in a standard additive Laplace measurement error model, one wants to estimate a linear functional of this latent (mixing) distribution. Asymptotically efficient maximum
likelihood estimation (MLE) of integral linear functionals of the mixing
distribution in a convolution model with the Laplace kernel density is
investigated. Situations are distinguished in which the functional of interest
can be consistently estimated at √n-rate by the plug-in MLE, which is
asymptotically normal and efficient, in the sense of achieving the variance
lower bound, from those in which no integral linear functional can be estimated
at parametric rate, which precludes any possibility for asymptotic efficiency.
The √n-convergence of the MLE, valid in the case of a degenerate mixing
distribution at a single location point, fails in general, as does asymptotic
normality. It is shown that there exists no regular estimator sequence for
integral linear functionals of the mixing distribution that, when recentered
about the estimand and √n-rescaled, is asymptotically efficient, viz., has a Gaussian limit distribution with minimum variance. One can thus only expect estimation at some slower rate and, often, with a non-Gaussian limit distribution.
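In our own notation, not the paper's, the target and the favorable regime can be sketched as

$$ \psi(G) = \int a(x)\,\mathrm{d}G(x), \qquad \sqrt{n}\,\big(\psi(\hat{G}_n) - \psi(G_0)\big) \rightsquigarrow \mathcal{N}(0, v_{\mathrm{eff}}^2), $$

where Ĝ_n is the nonparametric MLE of the mixing distribution G_0 and v²_eff is the variance lower bound; the negative results say that, outside this regime, no integral linear functional admits a √n-consistent, regular, efficient estimator sequence.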
Bayes and empirical Bayes: do they merge?
Bayesian inference is attractive for its coherence and good frequentist
properties. However, it is a common experience that eliciting an honest prior may be difficult and, in practice, people often take an empirical Bayes approach, plugging empirical estimates of the prior hyperparameters into the
posterior distribution. Even if not rigorously justified, the underlying idea
is that, when the sample size is large, empirical Bayes leads to "similar"
inferential answers. Yet, precise mathematical results seem to be missing. In
this work, we give a more rigorous justification in terms of merging of Bayes
and empirical Bayes posterior distributions. We consider two notions of
merging: Bayesian weak merging and frequentist merging in total variation.
Since weak merging is related to consistency, we provide sufficient conditions
for consistency of empirical Bayes posteriors. Also, we show that, under
regularity conditions, the empirical Bayes procedure asymptotically selects the value of the hyperparameter for which the prior most favors the "truth".
Examples include empirical Bayes density estimation with Dirichlet process mixtures. Comment: 27 pages.
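To make the plug-in recipe concrete, here is a minimal conjugate-normal sketch of the empirical Bayes procedure; the model, the prior, and the hyperparameter choice are ours for illustration, not the paper's:

```python
import numpy as np

# Conjugate-normal sketch of empirical Bayes: estimate the prior
# hyperparameter mu by maximizing the marginal likelihood of the data,
# then plug mu_hat into the posterior.
rng = np.random.default_rng(0)
theta_true, sigma, n = 1.3, 1.0, 500
x = rng.normal(theta_true, sigma, size=n)

# Model: x_i | theta ~ N(theta, sigma^2); prior: theta ~ N(mu, 1).
# Marginally x_i ~ N(mu, sigma^2 + 1), so the marginal MLE of mu is
# the sample mean.
mu_hat = x.mean()

# Empirical Bayes posterior for theta, N(post_mean, post_var),
# obtained by plugging mu_hat into the conjugate update:
post_var = 1.0 / (n / sigma**2 + 1.0)
post_mean = post_var * (x.sum() / sigma**2 + mu_hat)
print(post_mean, post_var)   # concentrates near theta_true as n grows
```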
Wasserstein convergence in Bayesian and frequentist deconvolution models
We study the multivariate deconvolution problem of recovering the distribution of a signal from independent and identically distributed observations additively contaminated with random errors (noise) from a known distribution. For errors with independent coordinates having ordinary smooth densities, we derive an inversion inequality relating the L1-Wasserstein distance between two distributions of the signal to the L1-distance between the corresponding mixture densities of the observations. This smoothing inequality outperforms existing inversion inequalities. As an application of the inversion inequality to the Bayesian framework, we consider 1-Wasserstein deconvolution with Laplace noise in dimension one, using a Dirichlet process mixture of normal densities as a prior measure on the mixing distribution (or distribution of the signal). We construct an adaptive approximation of the sampling density by convolving the Laplace density with a well-chosen mixture of normal densities and show that the posterior measure concentrates around the sampling density at a nearly minimax rate, up to a log-factor, in the L1-distance. The same posterior law is also shown to automatically adapt to the unknown Sobolev regularity of the mixing density, thus leading to a new Bayesian adaptive estimation procedure for mixing distributions with regular densities under the L1-Wasserstein metric. We illustrate the utility of the inversion inequality also in a frequentist setting by showing that an appropriate isotone approximation of the classical kernel deconvolution estimator attains the minimax rate of convergence for 1-Wasserstein deconvolution in any dimension d ≥ 1, when only a tail condition is required on the latent mixing density. We also derive sharp lower bounds for these problems.
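On the frequentist side, here is a minimal sketch of a classical-style deconvolution kernel density estimator under standard Laplace noise, whose characteristic function 1/(1 + t²) is divided out up to a frequency cutoff 1/h; the flat-top (sinc-type) kernel, bandwidth, and grids are our illustrative choices:

```python
import numpy as np

def deconv_density(x_grid, y, h):
    """Deconvolution KDE: invert the empirical characteristic function
    of Y divided by the Laplace noise cf 1/(1+t^2), truncating at
    frequencies |t| <= 1/h (flat-top/sinc kernel)."""
    t = np.linspace(-1.0 / h, 1.0 / h, 2001)          # frequency grid
    dt = t[1] - t[0]
    ecf = np.exp(1j * np.outer(t, y)).mean(axis=1)    # empirical cf of Y
    integrand = ecf * (1.0 + t**2)                    # divide by noise cf
    vals = (np.exp(-1j * np.outer(x_grid, t)) @ (integrand * dt)).real
    return vals / (2.0 * np.pi)                       # inverse Fourier transform

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1000)        # latent signal X
y = x + rng.laplace(0.0, 1.0, size=1000)   # observed Y = X + Laplace noise
grid = np.linspace(-4.0, 4.0, 9)
print(np.round(deconv_density(grid, y, h=0.3), 3))
```

The raw estimate need not be a genuine density (it can dip below zero), which is precisely the defect the isotone approximation discussed in the abstract corrects.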